Computational Auditory Scene Analysis and Automatic Speech Recognition
Abstract
The human auditory system is, in a way, an engineering marvel. It is able to do wonderful things that powerful modern machines find extremely difficult. For instance, our auditory system is able to follow the lyrics of a song when the input is a mixture of speech and musical accompaniment. Another example is a party situation. Usually there are multiple groups of people talking, with laughter, ambient music and other sound sources running in the background. The input our auditory system receives through the ears is a mixture of all these. In spite of such a complex input, we are able to selectively listen to an individual speaker, attend to the music in the background, and so on. In fact, this ability of ‘segregation’ is so instinctive that we take it for granted without wondering about the complexity of the problem our auditory system solves. Colin Cherry, in the 1950s, coined the term ‘cocktail party problem’ while trying to describe how our auditory system functions in such an environment [12]. He did a series of experiments to study the factors that help humans perform this complex task [11]. A number of theories have been proposed since then to explain the observations made in those experiments [11,12,70]. Helmholtz had, in the mid-nineteenth century, reflected upon the complexity of this signal by using the example of a ballroom setting [22]. He remarked that even though the signal is “complicated beyond conception,” our ears are able to “distinguish all the separate constituent parts of this confused whole.” So how does our auditory system solve the so-called cocktail party problem? Bregman tried to give a systematic account in his seminal 1990 book Auditory Scene Analysis [8]. He calls
Similar resources
A computational auditory scene analysis system for speech segregation and robust speech recognition
A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, ...
Signal Separation Motivated by Human Auditory Perception: Applications to Automatic Speech Recognition
The human auditory system uses a number of well-identified cues to segregate and separate individual sound sources in a complex acoustical environment. For example, researchers in auditory scene analysis have long identified cues such as common onset, correlated fluctuations in instantaneous amplitude and frequency, harmonicity, and common interaural time and amplitude differences as ways of id...
Challenge Problem for Computational Auditory Scene Analysis: Understanding Three Simultaneous Speeches
Understanding three simultaneous speeches is proposed as a challenge problem to foster artificial intelligence, speech and sound understanding or recognition, and computational auditory scene analysis research. Automatic speech recognition under noisy environments is attacked by speech enhancement techniques such as noise reduction and speaker adaptation. However, the signal-to-noise ratio of sp...
On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis
What is the computational goal of auditory scene analysis? This is a key issue to address in the Marrian information-processing framework. It is also an important question for researchers in computational auditory scene analysis (CASA) because it bears directly on how a CASA system should be evaluated. In this chapter I discuss different objectives used in CASA. I suggest as a main CASA goal th...
Auditory Scene Analysis: Computational Models
Human listeners have a remarkable ability to separate a complex mixture of sounds into discrete sources. The processes underlying this ability have been termed ‘auditory scene analysis’ (Bregman 1990; this volume). Recently, an interdisciplinary field known as ‘computational auditory scene analysis’ (CASA) has emerged which aims to develop computer systems that mimic this aspect of hearing (Ros...
Separation of Speech by Computational Auditory Scene Analysis
The term auditory scene analysis (ASA) refers to the ability of human listeners to form perceptual representations of the constituent sources in an acoustic mixture, as in the well-known ‘cocktail party’ effect. Accordingly, computational auditory scene analysis (CASA) is the field of study which attempts to replicate ASA in machines. Some CASA systems are closely modelled on the known stages o...
Publication date: 2012